Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models
Abstract
Training conditional maximum entropy models on massive data sets requires significant computational resources. We examine three common distributed training methods for conditional maxent: a distributed gradient computation method, a majority vote method, and a mixture weight method. We analyze and compare the CPU and network time complexity of each of these methods and present a theoretical analysis of conditional maxent models, including a study of the convergence of the mixture weight method, the most resource-efficient technique. We also report the results of large-scale experiments comparing these three methods, which demonstrate the benefits of the mixture weight method: it consumes fewer resources while achieving performance comparable to that of standard approaches.
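For illustration only, the following Python sketch contrasts two of the three strategies named in the abstract, with scikit-learn's LogisticRegression standing in for the conditional maxent learner; the distributed gradient method is omitted here, since it amounts to synchronizing gradients across workers at every optimization step. All function names and the synthetic data are assumptions for this sketch, not the paper's code.

```python
# Hypothetical sketch of the mixture weight and majority vote strategies,
# using scikit-learn's LogisticRegression as a stand-in conditional maxent
# (multinomial logistic regression) learner. Illustrative, not the paper's
# implementation.
from copy import deepcopy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_shard(X, y):
    # Each worker trains an independent maxent model on its own data
    # shard; no per-iteration network communication is needed.
    return LogisticRegression(max_iter=1000).fit(X, y)

def mixture_weight_model(shard_models):
    # Mixture weight method: exchange parameters once and average them,
    # yielding a single model (one round of network traffic, one model
    # at test time). Assumes every shard saw all classes, so the coef_
    # arrays have matching shapes.
    avg = deepcopy(shard_models[0])
    avg.coef_ = np.mean([m.coef_ for m in shard_models], axis=0)
    avg.intercept_ = np.mean([m.intercept_ for m in shard_models], axis=0)
    return avg

def majority_vote_predict(shard_models, X):
    # Majority vote method: every shard model predicts and the most
    # frequent label wins; all models must be kept and run at test time.
    votes = np.stack([m.predict(X) for m in shard_models])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

# Illustrative usage on synthetic data split into four shards.
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=10, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = [train_shard(Xs, ys)
          for Xs, ys in zip(np.array_split(X_tr, 4), np.array_split(y_tr, 4))]
print("mixture weight acc:", mixture_weight_model(models).score(X_te, y_te))
print("majority vote acc:",
      (majority_vote_predict(models, X_te) == y_te).mean())
```

The sketch makes the resource trade-off the abstract describes visible: mixture weight communicates parameters once and keeps a single model, while majority vote must store and evaluate every shard model at prediction time.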
Similar resources
Efficient sampling and feature selection in whole sentence maximum entropy language models
Conditional Maximum Entropy models have been successfully applied to estimating language model probabilities of the form P(w|h), but are often too demanding computationally. Furthermore, the conditional framework does not lend itself to expressing global sentential phenomena. We have recently introduced a non-conditional Maximum Entropy language model which directly models the probability of an entir...
Closed-Form Training of Conditional Random Fields for Large Scale Image Segmentation
We present LS-CRF, a new method for very efficient large-scale training of Conditional Random Fields (CRFs). It is inspired by existing closed-form expressions for the maximum likelihood parameters of a generative graphical model with tree topology. LS-CRF training requires only solving a set of independent regression problems, for which closed-form expression as well as efficient iterative sol...
Computationally Efficient M-Estimation of Log-Linear Structure Models
We describe a new loss function, due to Jeon and Lin (2006), for estimating structured log-linear models on arbitrary features. The loss function can be seen as a (generative) alternative to maximum likelihood estimation with an interesting information-theoretic interpretation, and it is statistically consistent. It is substantially faster than maximum (conditional) likelihood estimation of con...
Minimum Entropy Estimation of Hierarchical Random Graph Parameters for Character Recognition
In this paper, we propose a new parameter estimation method called minimum entropy estimation (MEE), which tries to minimize the conditional entropy of the models given the input data. Since MEE makes no assumption about the correctness of the models' parameter space, it will perform no worse than other estimation methods such as maximum likelihood ...
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We giv...
Publication date: 2009